Abstract

It is a well-known fact that the degree distribution (DD) of the nodes in a 
partition of a bipartite network influences the DD of its one-mode projection 
on that partition. However, there are no studies exploring the effect of the 
DD of the other partition on the one-mode projection. In this article, we show 
that the DD of the other partition, in fact, has a very strong influence on the 
DD of the one-mode projection. We establish this fact by deriving the exact 
or approximate closed-forms of the DD of the one-mode projection through the 
application of generating function formalism followed by the method of iterative 
convolution. The results are cross-validated through appropriate simulations. 

1. Introduction

A bipartite network consists of two partitions of nodes, say U and V, such 
that edges connect nodes from different partitions, but never those in the same 
partition. A one-mode projection of such a bipartite network onto U is a network
consisting of the nodes in U; two nodes u and u' are connected in the
one-mode projection if and only if there exists a node v ∈ V such that (u, v)
and (u', v) are edges in the corresponding bipartite network. Many real-life
networks are, in fact, one-mode projections of a more fundamental bipartite
structure [1, 2]. As an example, consider the friendship and word co-occurrence
networks. The former arises from the underlying bipartite relationship of the 
individual to different places (pubs, family, workplace etc.) because friendship 
groups evolve around certain social contexts (e.g., people regularly meeting in 
a pub, or colleagues at a workplace). The latter arises from an underlying
word-sentence bipartite network. Therefore, for several real-world complex systems
as described in 0, 0, 0, [1], understanding the underlying bipartite process turns
out to be extremely important.

In this bipartite process, there are precisely two components - (a) the at- 
tachment process i.e., how ties get formed between individuals or words (we 
shall refer to this partition as U) and different entities like pubs, workplaces 
or sentences (partition V), and (b) the size distribution of the entities in V, 
for instance, the number of words in a sentence or number of individuals at a 
workplace. The effect of the former is heavily studied in the literature and it 
is well-known that the attachment in real world bipartite networks is largely 
preferential in nature 0, IE S. Nevertheless, the latter has not received
much attention in the network community, even though the basic framework for
computing the DD of the one-mode projection was formulated long ago [11].
The popular but unrealistic assumption that the degree of the nodes in
partition V is a constant results in networks whose one-mode projection onto U
has a DD qualitatively identical to that of U in the bipartite network. Here we
show that under a more realistic assumption, where the degrees of the nodes in
V are sampled from a distribution (which is not a constant), the DD of this
one-mode projection is remarkably different from the DD of U. Our analysis
reveals that the dependence of the DD of the one-mode projection on the DD of
V is so strong that even a slight relaxation of the "constant degree" assumption,
for instance if the DD of V is peaked (normal and exponential distributions),
leads to significantly different results.

The generating function (GF) formalism introduced in [11] yields open-form
equations for the one-mode DD, and it is therefore difficult to derive meaningful
insight from these equations. The main contribution of this work lies in the 
derivation of the closed-forms for the DD of the one-mode projection under 
some realistic assumptions. We used the process of iterative convolution to 
arrive at our results, which enabled us to analytically study the influence of the 
DD of the partition V, so long overlooked in the literature. The results have 
been cross-validated through appropriate simulations. 



2. Analysis of the degree distribution of the one-mode projection 

Formally, the one-mode projection considered here is a graph where $u_i, u_j \in U$
are connected by an edge if there exists a node $v \in V$ such that there is an
edge between (a) $u_i$ and $v$ and (b) $u_j$ and $v$ in the bipartite network. If
there are $w$ such nodes in $V$ which are connected to both $u_i$ and $u_j$ in the
bipartite network, then there are $w$ edges linking $u_i$ and $u_j$ in the one-mode
projection. Alternatively, one can think of the one-mode projection as a weighted
graph, where the weight of the edge $(u_i, u_j)$ is $w$. In the rest of the paper,
we always consider the degree distribution of this weighted one-mode network.
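
The weighted projection described above can be made concrete with a small sketch. The function name `weighted_projection_degrees` and the edge-list input format are illustrative assumptions, not part of the original formalism; repeated (u, v) pairs are treated as a single bipartite edge, and node labels are assumed to be sortable.

```python
from collections import defaultdict
from itertools import combinations


def weighted_projection_degrees(edges):
    """Weighted one-mode degree of every node in U (hypothetical helper).

    `edges` is an iterable of (u, v) pairs of the bipartite network.
    Duplicate (u, v) pairs are collapsed (an assumption of this sketch).
    """
    edges = list(edges)

    # Group the nodes of U by the node of V they attach to.
    members_of_v = defaultdict(set)
    for u, v in edges:
        members_of_v[v].add(u)

    # weight[(a, b)] counts the nodes of V shared by a and b.
    weight = defaultdict(int)
    for members in members_of_v.values():
        for a, b in combinations(sorted(members), 2):
            weight[(a, b)] += 1

    # The weighted degree of a node is the sum of its edge weights.
    degree = {u: 0 for u, _ in edges}
    for (a, b), w in weight.items():
        degree[a] += w
        degree[b] += w
    return degree
```

For a node attached to nodes of V with degrees 3 and 2, the function returns (3 - 1) + (2 - 1) = 3, since each neighbour in V of degree d contributes d - 1 to the weighted degree.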

Let us assume that the degrees of the nodes in partition $V$ are sampled from
a distribution $f_d$ with expected value $\mu$. Let us denote the degree of
a node $u \in U$ in the bipartite network by $k$ and the corresponding DD by $p_k$.
Further, let $q$ denote the degree of the nodes in the one-mode projection on $U$.
Let us call the probability that a node $u$ having degree $k$ in the bipartite
network ends up as a node having degree $q$ in the one-mode projection $F_k(q)$.
Also, let us denote the degree distribution of the nodes in $U$ in the one-mode
projection by $p_u(q)$. If we assume that the degrees of the $k$ nodes in $V$ to
which $u$ is connected are $d_1, d_2, \ldots, d_k$, then we can write

$$q = \sum_{i=1 \ldots k} (d_i - 1) \qquad (1)$$

The probability that the node $u$ in the bipartite network is connected to a node
in $V$ of degree $d_i$ is $d_i f_{d_i}$ (up to the normalizing factor $1/\mu$),
where $i = 1 \ldots k$. At this point, one might apply the GF formalism [11] to
calculate the degree distribution of the nodes in the one-mode projection as
follows. Let $f(x) = \sum_d f_d x^d$ denote the GF for the distribution of the node
degrees in $V$, $p(x) = \sum_k p_k x^k$ denote the GF for the degree distribution
of the nodes in $U$, and $g(x)$ denote the GF for $p_u(q)$; then it is
straightforward to see from eq. (70) of [11] that

$$g(x) = p\left(f'(x)/\mu\right) \qquad (2)$$

On suitable expansion of eq. (2) we obtain

$$p_u(q) = \sum_k p_k F_k(q) \qquad (3)$$

or,

$$p_u(q) = \sum_k p_k \sum_{d_1 + d_2 + \ldots + d_k - k = q} \frac{d_1 d_2 \cdots d_k}{\mu^k} f_{d_1} f_{d_2} \cdots f_{d_k} \qquad (4)$$

For peaked distributions we can make the assumption that there will be a
finite probability only when $d \approx \mu$ and $\mu \gg 1$. Hence,
$d_1 + d_2 + \ldots + d_k \approx k\mu$, which implies that the arithmetic mean is
roughly equal to the geometric mean. Therefore, we have $d_1 d_2 \cdots d_k$
approximately equal to $\mu^k$. We shall shortly discuss in further detail the
bounds of this approximation (section 2.3). However, prior to that, let us
investigate how this approximation helps in advancing our analysis. Under the
assumption $d_1 d_2 \cdots d_k / \mu^k = 1$, $F_k(q)$ can be thought of as the
distribution of the sum of $k$ random variables each sampled from $f_d$. In other
words, $F_k(q)$ tells us how the sum of the $k$ random variables is distributed if
each of these individual random variables is drawn from the distribution $f_d$.
This distribution of the sum can be obtained by the iterative convolution of $f_d$
for $k$ times.¹ If the closed-form expression for the convolution exists for a
distribution, then we can obtain an analytical expression for $p_u(q)$. In the
following, we shall attempt to find an expression for $p_u(q)$ assuming three
different forms of the distribution $f_d$. As we shall see, $F_k(q)$ is different
for each of these forms, thereby making the degree distribution of the nodes in
the one-mode projection sensitive to the choice of $f_d$. Since in the expression
for $q$ (eq. (1)) we need to subtract one from each of the $d_i$ terms (i.e., each
term is $(d_i - 1)$ rather than $d_i$), the mean of the distribution $F_k(q)$ has
to be shifted accordingly.
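
The iterative convolution can also be carried out numerically when no closed form is available. The following is a minimal sketch, assuming that the distribution of degrees in V is given as a finite array `f` with `f[d]` the probability of degree d and `f[0] = 0`. The function names and the `size_biased` flag are hypothetical: with `size_biased=True` each term is drawn with weight proportional to $d f_d$ as in eq. (4), and with `size_biased=False` it is drawn from $f_d$ directly, which corresponds to the approximation.

```python
import numpy as np


def shifted_sum_pmf(f, k, size_biased=False):
    """PMF of q = sum_{i=1..k} (d_i - 1), returned as an array indexed by q.

    f[d] is the probability that a node in V has degree d; f[0] must be 0.
    """
    f = np.asarray(f, dtype=float)
    d = np.arange(len(f))
    if size_biased:
        w = d * f / np.dot(d, f)   # weight d * f_d / mu
    else:
        w = f / f.sum()            # approximation: draw from f_d itself
    step = w[1:]                   # index j now stands for d - 1 = j
    out = np.array([1.0])          # distribution of the empty sum (q = 0)
    for _ in range(k):
        out = np.convolve(out, step)   # one more term in the sum
    return out


def one_mode_pmf(p, f, size_biased=False):
    """Mixture sum_k p[k] * F_k(q), with p[k] the DD of U in the bipartite network."""
    p = np.asarray(p, dtype=float)
    qmax = (len(p) - 1) * (len(f) - 2)
    total = np.zeros(qmax + 1)
    for k, pk in enumerate(p):
        if pk > 0:
            fk = shifted_sum_pmf(f, k, size_biased)
            total[:len(fk)] += pk * fk
    return total
```

Calling `one_mode_pmf(p, f, size_biased=True)` and `one_mode_pmf(p, f)` on the same inputs gives a direct numerical view of how far the approximation moves the distribution for a given spread of `f`.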



¹ Apart from some special cases, $d f_d$ is hard to convolve and so we work with the
approximate $F_k(q)$ here.

2.1. Effect of the sampling distribution $f_d$

In this section, we shall analytically study the effect of the sampling
distribution $f_d$ on the degree distribution of the one-mode projection of the
bipartite network.

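Delta function: Let $f_d$ be a delta function of the form

$$\delta(d, \mu) = \begin{cases} 1 & \text{if } d = \mu \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

If this delta function is convolved $k$ times then the sum should be distributed
as

$$F_k(q) = \delta(q, k\mu - k) \qquad (6)$$

Therefore, $p_u(q)$ exists only when $q = k(\mu - 1)$ or $k = q/(\mu - 1)$ and we
have (also reported in [10])

$$p_u(q) = \begin{cases} p_{q/(\mu - 1)} & \text{if } q = k(\mu - 1) \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$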

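Normal distribution: If $f_d$ is a normal distribution of the form $N(\mu, \sigma^2)$
then the sum of $k$ random variables sampled from $f_d$ is again distributed as a
normal distribution of the form $N(k\mu, k\sigma^2)$. Therefore, $F_k(q)$ is given by

$$F_k(q) = N(k\mu - k, k\sigma^2) \qquad (8)$$

If we substitute the density function for $N$ we have

$$p_u(q) = \sum_k p_k \frac{1}{\sqrt{2\pi k \sigma^2}} \exp\left(-\frac{(q - k\mu + k)^2}{2k\sigma^2}\right) \qquad (9)$$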
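
For the normal case, the mixture of eq. (3) with $F_k(q) = N(k\mu - k, k\sigma^2)$ can be evaluated numerically. The sketch below assumes that $p_k$ is supplied as a finite array `p` indexed by k; the function name `one_mode_pmf_normal` is hypothetical, and the k = 0 term is skipped because it has no density.

```python
import numpy as np


def one_mode_pmf_normal(q, p, mu, sigma):
    """Evaluate sum_k p[k] * Normal(q; mean=k*(mu-1), variance=k*sigma**2)."""
    q = np.asarray(q, dtype=float)
    total = np.zeros_like(q)
    for k, pk in enumerate(p):
        if k == 0 or pk == 0:
            continue  # a node of bipartite degree 0 contributes no density
        var = k * sigma**2
        mean = k * (mu - 1)  # mean shifted by one per term
        total += pk * np.exp(-(q - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return total
```

Because the expression is a density in q while simulations produce a probability mass function, the output generally needs to be renormalized over the simulated range of q before any comparison.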

Exponential distribution: If $f_d$ is an exponential distribution of the form
$E(\lambda)$ where $\lambda = 1/\mu$ then the sum of the $k$ random variables
sampled from $f_d$ is known to take the form of a gamma distribution
$\Gamma(q; k, \mu)$. Therefore, we have

$$F_k(q) = \Gamma(q; k, \mu - 1) \qquad (10)$$

Thus, we have ($\lambda' = 1/(\mu - 1)$)

$$p_u(q) = \lambda' \sum_k p_k \frac{\exp(-\lambda' q)(\lambda' q)^{k-1}}{(k-1)!} \qquad (11)$$
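
The gamma mixture for the exponential case can be evaluated in the same spirit. This is a minimal sketch in which each term is a gamma density with shape k and scale $\mu - 1$; it assumes $p_k$ is given as a finite array `p`, the function name is hypothetical, and the terms are computed in log space (via `math.lgamma`) only to avoid overflow of the factorial for large k.

```python
import math


def one_mode_pmf_exponential(q, p, mu):
    """Evaluate lam * sum_k p[k] * exp(-lam*q) * (lam*q)**(k-1) / (k-1)!

    with lam = 1 / (mu - 1); q is a positive scalar.
    """
    if q <= 0:
        return 0.0
    lam = 1.0 / (mu - 1.0)
    x = lam * q
    total = 0.0
    for k, pk in enumerate(p):
        if k == 0 or pk == 0:
            continue
        # log of exp(-x) * x**(k-1) / (k-1)!, using lgamma(k) = log((k-1)!)
        log_term = -x + (k - 1) * math.log(x) - math.lgamma(k)
        total += pk * math.exp(log_term)
    return lam * total
```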

2.2. Choice of $p_k$ and illustration

The framework presented above is applicable for any choice of $p_k$. The literature
presents two broad categories of bipartite networks: (a) where both partitions
grow 0, II] and (b) where one partition is fixed [ifl, 12 1. This second case is
particularly interesting because it is appropriate to model discrete combinatorial
systems (DCS) [l^]. A DCS consists of a finite set of elementary units (e.g.,
codons and letters/phonemes, i.e., $U$) that serve as its basic building blocks and
the system, in turn, is a collection of a potentially infinite number of discrete
combinations of these units (e.g., genes and languages, i.e., $V$). In this case,
briefly, the stochastic model used to construct the bipartite network is as follows:
at each time step $t$, a new node is introduced in the set $V$ which preferentially
connects itself to $\mu$ nodes in $U$. Let $v_t$ be the node added to $V$ during
time step $t$. Let $A(k_i^t)$ denote the probability that a new node $v_t$ entering
$V$ attaches itself to a node $u_i \in U$, where $k_i^t$ refers to the degree of the
node $u_i$ at time step $t$. $A(k_i^t)$ defines the attachment kernel and takes the
form

$$A(k_i^t) = \frac{\gamma k_i^t + 1}{\sum_{j=1}^{N} (\gamma k_j^t + 1)} \qquad (12)$$

where the sum in the denominator runs over all the nodes in $U$, and $1/\gamma$ is
the tunable model parameter which is usually referred to as the initial
attractiveness [31. Note that the higher the value of $1/\gamma$, the higher the
randomness in the system.
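
The growth process can be sketched as a short simulation. This is an illustrative reading of the model rather than the authors' simulation code: the function name `grow_bipartite` is hypothetical, the degrees of U are assumed to be frozen while a single new node of V chooses its neighbours, and a node of U may be chosen more than once by the same node of V (sampling with replacement).

```python
import numpy as np


def grow_bipartite(N, t, mu, gamma, seed=0):
    """Grow a bipartite network with N fixed nodes in U over t time steps.

    At every step a new node of V picks mu nodes of U, each with probability
    proportional to gamma * k_i + 1, where k_i is the current degree of u_i.
    Returns the final degrees of U and the list of chosen neighbours per step.
    """
    rng = np.random.default_rng(seed)
    k = np.zeros(N, dtype=int)
    members = []
    for _ in range(t):
        weights = gamma * k + 1.0                 # attachment kernel numerators
        chosen = rng.choice(N, size=mu, replace=True, p=weights / weights.sum())
        np.add.at(k, chosen, 1)                   # handles repeated indices
        members.append(chosen)
    return k, members
```

To mimic degrees in V that are sampled from a distribution rather than fixed, `mu` could be replaced by a fresh draw at every step; that variation is left out here to keep the sketch short.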

[9, 10] show that the emergent $p_k$ for the above model asymptotically
approaches a $\beta$-distribution such that
$p_k = M (k/t)^{\gamma^{-1} - 1} (1 - k/t)^{\eta - \gamma^{-1} - 1}$, where
$\eta = N/\mu\gamma$ and $M$ is a normalization constant. Note that $\beta$-distributions are
more general than power-law distributions (noticed in expanding bipartite
networks 0, S|) since they can take different forms ranging from a normal
distribution to a heavy-tailed distribution depending on the two parameters of the
distribution. Therefore, we illustrate the results of the equations with this
type of $\beta$-distribution presented in [l^l. Figure 1(a) shows the cumulative
degree distribution of the nodes in $U$ in the bipartite network assuming that
nodes in $V$ arrive with degrees sampled from $f_d$, which can take the form of a
(i) normal, (ii) delta, (iii) exponential and (iv) power-law distribution, each with
mean ($\mu = 22$). Note that we use the probability mass functions rather than
the probability density functions (as in the theoretical analysis) for the
simulation results reported in this figure. Further, note that the standard
deviation ($\sigma$) of the normal distribution is controlled in such a way that
the value of the random variable $d$ is never negative. Figure 1(b) shows the degree
distributions of the one-mode projections corresponding to the bipartite networks
generated for Figure 1(a). The result clearly implies that the degree distribution
of the one-mode projection varies depending on how the degrees of the nodes in $V$
are distributed, although the degree distribution of the nodes in $U$ remains
unaffected for all the bipartite networks generated. Figure 1(c)-(e) shows the match
of the analytical expressions (with appropriate normalization) derived with the
respective stochastic simulations. Note that if $f_d$ is power-law distributed, the
standard deviation $\sigma$ diverges and therefore an analytical study of this case
is beyond the scope of the paper. In addition, no clear closed-form solution for the
convolution exists for this case. However, the stochastic simulation (Figure 1(b))
indicates that this choice results in a one-mode degree distribution that is quite
different from the case where $f_d$ is constant.

Figure 1: Degree distribution of bipartite networks and the corresponding one-mode
projections in doubly-logarithmic scale. $N = 1000$, $t = 1000$ and $\gamma = 2$.
For stochastic simulations, the results are averaged over 100 runs. All the results
are appropriately normalized. (a) Degree distributions of the nodes in $U$ in the
bipartite network generated through stochastic simulations when $f_d$ is a
(i) normal ($\mu = 22$, $\sigma = 13$), (ii) delta ($\mu = 22$), (iii) exponential
($\mu = 1/\lambda = 22$) and (iv) power-law (exponent $\lambda = 1.16$, $\mu = 22$,
simulated within the interval [$k_{min} = 1$, $k_{max} = 311$]) distribution;
(b) the degree distributions of the one-mode projections of the bipartite networks
in (a); (c) match between stochastic simulations (green circles) and eq. (9) (red
line) with $\mu = 22$, $\sigma = 13$; black triangles indicate the case where $f_d$
is a constant with $\mu = 22$; blue line shows how the result deteriorates when
$\sigma$ is 100 times larger; (d) match between stochastic simulations (black
circles) and eq. (7) (red line) where $\mu = 22$; (e) match between stochastic
simulations (green circles) and eq. (11) (red line) where $\mu = 22$; blue lines
show the plot for eq. (19); black triangles indicate the case where $f_d$ is a
constant with $\mu = 22$ (given as a reference to show that even the approximate
eq. (19) produces better results).



2.3. Approximation bounds

Here we discuss the limitations of the approximation that we made in eq. (4)
by assuming that $d_1 d_2 \cdots d_k / \mu^k = 1$. We shall employ the GF formalism
to find the necessary condition (in the asymptotic limits) for our approximation to
hold. More precisely, we shall attempt to estimate the difference in the means (or
the first moments) of the exact and the approximate expressions for $p_u(q)$ and
discuss when this difference is negligible, which in turn serves as a necessary
condition for the approximation to be valid. We shall denote the generating
function for the approximate expression of $p_u(q)$ as $g_{app}(x)$. In this case,
the GF encoding the probability that the node $u$ is connected to a node in $V$ of
degree $d$ is simply $\sum_d f_d x^{d-1}$, which is $f(x)/x$, and consequently the
GF of $F_k(q)$ is given by $(f(x)/x)^k$. Therefore,

$$g_{app}(x) = \sum_k p_k \left(\frac{f(x)}{x}\right)^k = p(f(x)/x) \qquad (13)$$

Now we can calculate the first moments for the approximate and the exact $p_u(q)$
by evaluating the derivatives of $g_{app}(x)$ and $g(x)$ respectively at $x = 1$.
We have

$$g'_{app}(1) = \frac{d}{dx} p(f(x)/x)\Big|_{x=1} = (t/N)\mu(\mu - 1) \qquad (14)$$

Similarly,

$$g'(1) = \frac{d}{dx} p(f'(x)/\mu)\Big|_{x=1} = (t/N)\mu(\mu - 1) + (t/N)\sigma^2 \qquad (15)$$

Thus, the mean of the approximate $p_u(q)$ is smaller than the actual mean
by $(t/N)\sigma^2$. Clearly, for $\sigma = 0$, the approximation gives us the exact
solution, which is indeed the case for delta functions. Also, in the asymptotic
limits, if $N \to \infty$ (with a scaling of $1/t$), the approximation holds good.
However, as the value of $\sigma$ increases the results start deteriorating (blue
line in Figure 1(c)).
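
The gap between the two first moments can be checked numerically for any finite distribution of degrees in V. The sketch below uses the chain rule on the two generating functions: the approximate mean is $p'(1)(f'(1) - 1)$ and the exact mean is $p'(1) f''(1)/\mu$, where $p'(1)$ is the mean bipartite degree of the nodes in U (equal to $t\mu/N$ when t nodes of V attach to N nodes of U). The function name `projection_means` and its argument names are assumptions made for illustration.

```python
import numpy as np


def projection_means(f, mean_k):
    """Return (approximate mean, exact mean) of the one-mode degree.

    f[d] is the probability that a node in V has degree d;
    mean_k is p'(1), the mean degree of the nodes in U.
    """
    f = np.asarray(f, dtype=float)
    d = np.arange(len(f))
    mu = np.dot(d, f)
    var = np.dot(d**2, f) - mu**2
    approx = mean_k * (mu - 1)                    # p'(1) * (f'(1) - 1)
    exact = mean_k * (var + mu * (mu - 1)) / mu   # p'(1) * f''(1) / mu
    return approx, exact
```

With `mean_k = (t/N) * mu` the difference `exact - approx` equals `(t/N) * var`, and it vanishes when all the mass of `f` sits on a single degree.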



2.4. Closed-form expression

Finally, it remains to be mentioned that in some special cases it is possible
to derive a closed-form expression for $p_u(q)$. If $p_k F_k(q)$ takes up a very
simple form then a closed-form expression for $p_u(q)$ can be derived straight away.
For instance, if in eq. (11), $p_k \propto (k - 1)!$, then one can easily show by
changing the discrete sum to a continuous integral that

P.(.) = ^^^^^[(A'.)-^-(A',)-] (16) 
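
As an illustration of how an expression of this kind can arise, the following sketch carries out the sum-to-integral step for the exponential case, taking the summand to be $\lambda' p_k \exp(-\lambda' q)(\lambda' q)^{k-1}/(k-1)!$. The proportionality constant $C$ and the integration limits $k_{\min}$ and $k_{\max}$ are hypothetical symbols introduced here; the limits used in the original derivation are not recoverable from the text, so this should not be read as the exact form of eq. (16).

```latex
% Assumption: p_k = C (k-1)!, so the factorials cancel term by term.
\begin{align*}
p_u(q) &= \lambda' \sum_k C\,(k-1)!\,
          \frac{\exp(-\lambda' q)\,(\lambda' q)^{k-1}}{(k-1)!}
        = C \lambda' \exp(-\lambda' q) \sum_k (\lambda' q)^{k-1} \\
       &\approx C \lambda' \exp(-\lambda' q)
          \int_{k_{\min}}^{k_{\max}} (\lambda' q)^{k-1}\, dk \\
       &= \frac{C \lambda' \exp(-\lambda' q)}{\ln(\lambda' q)}
          \left[ (\lambda' q)^{k_{\max}-1} - (\lambda' q)^{k_{\min}-1} \right]
\end{align*}
```

The last step uses $\int a^{k-1}\,dk = a^{k-1}/\ln a$ with $a = \lambda' q$, which requires $\lambda' q \neq 1$.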

There can be a second situation too. One can think of $p_k F_k(q)$ as a function $F$
in $q$ and $k$, i.e., $p_k F_k(q) = F(q, k)$. If $F(q, k)$ can be exactly (or
approximately) factored into a form like $F(q)F(k)$ then $p_u(q)$ becomes

$$p_u(q) = F(q) \sum_k F(k) \qquad (17)$$

Changing the sum in eq. (17) to its continuous form we have

$$p_u(q) = F(q) \int^{\infty} F(k)\,dk = A F(q) \qquad (18)$$



where $A$ is a constant. Thus, the nature of the resulting distribution is
dominated by the function $F(q)$. For instance, in the case of exponentially
distributed $f_d$, with some algebraic manipulations and certain approximations one
can show that (blue line in Figure 1(e))



where EXP() is the exponential distribution function. 

3. Discussion

In this paper, we identified that the degree distribution of the one-mode
projection of a bipartite network onto the partition $U$ is sensitive to the degree
distribution of the other partition $V$. Further, we showed that if partition $V$
corresponds to a peaked distribution then it is possible to derive a closed-form
expression for the one-mode degree distribution. The derivation of the closed-form
solution for the one-mode degree distribution points to the fact that this
distribution is not always reminiscent of $p_k$ (i.e., the degree distribution of
the nodes in $U$ in the bipartite network) as has been demonstrated in the literature.
While eq. ([TB| shows that this distribution could be a complex coupling of the 
terms $k$ and $q$, eq. (19) shows that it might be completely dominated by $f_d$
(i.e., the distribution of the node degrees in $V$ in the bipartite network). We
believe that this observation is an important departure from what has been reported
so far in the literature. In addition, from our simulation results it is clear that
the one-mode degree distribution is affected when the DD of partition $V$ is not
peaked (see the power-law case in Figure 1(b) and the normal distribution case with
high $\sigma$ in Figure 1(c)). These results indicate that as the standard deviation
$\sigma$ becomes larger, the effect on the one-mode degree distribution becomes
more pronounced. Hence, an important future attempt would be to analytically solve
for cases where $f_d$ is not peaked, i.e., has arbitrary $\mu$, $\sigma$. A final
interesting and non-trivial direction could be to perform a similar analysis as
done here but limited to the unweighted versions of the one-mode networks.